Fix CUDA EP: add opset 24 kernel registrations for Reshape and Cast by justinchuby · Pull Request #28368 · microsoft/onnxruntime

justinchuby · 2026-05-05T18:48:17Z

ONNX opset 24 bumped the Reshape and Cast operator versions (adding the float8e8m0 type). The ORT CUDA EP only had opset 23 registrations, so on opset 24 models these ops fell back to CPUExecutionProvider, producing ~280 memcpy nodes.
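The fallback follows from ONNX operator versioning. A node's schema version is the latest version of that operator at or below the model's opset. Because Reshape and Cast changed in opset 24, their nodes in an opset 24 model carry since-version 24, which the opset 23 CUDA kernels do not claim. Below is a minimal sketch of that resolution step. The version list is illustrative and includes only the two versions discussed here. `resolve_since_version` is a hypothetical helper, not an ONNX or ORT API.

```python
from bisect import bisect_right

# Illustrative only: opset versions at which the operator's schema changed.
# Only the two versions relevant to this PR are listed; real histories are longer.
RESHAPE_SCHEMA_VERSIONS = [23, 24]


def resolve_since_version(schema_versions, model_opset):
    """Return the schema version in effect for a model opset: the largest
    schema version that is <= model_opset (hypothetical helper)."""
    versions = sorted(schema_versions)
    idx = bisect_right(versions, model_opset)
    if idx == 0:
        raise ValueError(f"operator not defined at opset {model_opset}")
    return versions[idx - 1]
```

With this list, an opset 23 model resolves Reshape to 23 and an opset 24 model resolves it to 24. A kernel set that only knows about 23 therefore sees a version it has no registration for.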

Fix: cap the existing opset 23 registrations at (23, 23) and add non-versioned opset 24 registrations. Both reuse the same kernel code.
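The sketch below models two things: why the open-ended opset 23 registration does not cover opset 24 nodes, and why it has to be capped. It is a simplification based on our reading of the version check in ORT's kernel registry. Under that reading, a non-versioned registration must start exactly at the node's since-version, while a versioned (closed) range may cover later ones. `Registration`, `matches`, `overlaps` and `assigned_provider` are hypothetical names, not ORT code.

```python
import sys
from dataclasses import dataclass

# Hypothetical stand-in for "no end version" (a non-versioned registration).
OPEN_ENDED = sys.maxsize


@dataclass(frozen=True)
class Registration:
    """Hypothetical model of one kernel registration: op type plus [start, end]."""
    op_type: str
    start: int
    end: int = OPEN_ENDED


def matches(reg: Registration, op_type: str, since_version: int) -> bool:
    if reg.op_type != op_type:
        return False
    if reg.start == since_version:
        return True  # starts exactly at the node's schema version
    # Only a closed (versioned) range may cover a later since-version.
    return reg.end != OPEN_ENDED and reg.start < since_version <= reg.end


def overlaps(a: Registration, b: Registration) -> bool:
    """True if two registrations for the same op claim a common version."""
    return a.op_type == b.op_type and a.start <= b.end and b.start <= a.end


def assigned_provider(cuda_regs, op_type: str, since_version: int) -> str:
    if any(matches(r, op_type, since_version) for r in cuda_regs):
        return "CUDAExecutionProvider"
    # No CUDA kernel: the node falls back to CPU and Memcpy nodes bridge devices.
    return "CPUExecutionProvider"


# Registrations before and after this PR (same kernel code, different ranges).
BEFORE = [Registration("Reshape", 23), Registration("Cast", 23)]
AFTER = [
    Registration("Reshape", 23, 23), Registration("Reshape", 24),
    Registration("Cast", 23, 23), Registration("Cast", 24),
]
```

Under this model, `assigned_provider(BEFORE, "Reshape", 24)` yields `CPUExecutionProvider`. With `AFTER`, both 23 and 24 stay on CUDA. `overlaps` also shows that an open-ended opset 23 registration would clash with an open-ended opset 24 one, which is presumably why the old registration is capped at (23, 23).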

Result: 282 memcpy → 4 memcpy for opset 24 models.
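To check a count like this on another model, one option is to have ORT save the graph it actually runs and then count the Memcpy nodes in it. The sketch below assumes the `onnx` and `onnxruntime-gpu` packages are installed. `SessionOptions.optimized_model_filepath` is a real ORT option, but the helper names are hypothetical, and this is not necessarily how the numbers above were measured.

```python
from collections import Counter

MEMCPY_OP_TYPES = ("MemcpyFromHost", "MemcpyToHost")


def count_memcpy_nodes(op_types):
    """Count host/device copy nodes in an iterable of op_type strings.
    Returns (total, per-op-type counts). Hypothetical helper."""
    counts = Counter(t for t in op_types if t in MEMCPY_OP_TYPES)
    return sum(counts.values()), dict(counts)


def dump_optimized_cuda_graph(model_path, out_path):
    """Load model_path with the CUDA EP, have ORT save the transformed graph
    to out_path, and return its top-level nodes. Hypothetical helper."""
    # Imported lazily so count_memcpy_nodes works without these packages.
    import onnx
    import onnxruntime as ort

    so = ort.SessionOptions()
    so.optimized_model_filepath = out_path  # ORT writes the optimized graph here
    ort.InferenceSession(
        model_path, so,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    # Only the node structure is needed; skip loading external weight files.
    return onnx.load(out_path, load_external_data=False).graph.node
```

Usage: `count_memcpy_nodes(n.op_type for n in dump_optimized_cuda_graph("model.onnx", "model.opt.onnx"))`. Only top-level nodes are counted; nodes inside If/Loop subgraphs are not. Models above the 2 GB protobuf limit also need the optimized model saved with external initializers, which this sketch does not configure.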

Tested with Gemma4 E2B-it (2B, opset 24) on H200.

ONNX opset 24 bumped Reshape and Cast (added float8e8m0 type support). ORT CUDA EP only had opset 23 registrations, so these ops fell back to CPUExecutionProvider on opset 24 models, producing ~280 MemcpyFromHost/MemcpyToHost nodes.

Version existing opset 23 registrations to (23, 23) and add new non-versioned opset 24 registrations. Same kernel implementations.

Result: 282 memcpy → 4 memcpy for opset 24 models on CUDA EP.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>

tianleiwu · 2026-05-05T19:13:09Z

Overlaps with #27742 and #27744.

justinchuby · 2026-05-05T21:02:10Z

Will close. LMK when they can be merged?

justinchuby mentioned this pull request May 5, 2026

Fix CUDA EP: add opset 24 kernel registrations for Reshape/Cast + CUTLASS alignment #28366 (Closed)

justinchuby mentioned this pull request May 5, 2026

Fill CUDA Cast operator opset gap: extend registration from opset 23 to 25 #27744 (Merged)

justinchuby closed this May 5, 2026

justinchuby mentioned this pull request May 6, 2026

Fix CUDA EP: opset 24 kernel registrations + CUTLASS alignment + MEA dispatch #28365 (Closed)

justinchuby wants to merge 1 commit into main from fix-cuda-opset24-reshape-cast
